Illustrated review of convergence conditions of the value iteration algorithm and the rolling horizon procedure for average-cost MDPs
Authors
Abstract
This paper is concerned with the links between the Value Iteration algorithm and the Rolling Horizon procedure for solving problems of stochastic optimal control under the long-run average criterion, in Markov Decision Processes with finite state and action spaces. We review conditions from the literature which imply the geometric convergence of Value Iteration to the optimal value. Aperiodicity is an essential prerequisite for convergence. We prove that the convergence of Value Iteration generally implies that of Rolling Horizon. We also present a modified Rolling Horizon procedure that can be applied to models without analyzing periodicity, and discuss the impact of this transformation on convergence. We illustrate the different convergence results with numerous examples.

Keywords: Markov decision problems, value iteration, heuristic methods, rolling horizon.

∗ CONICET UNR, Argentina
† CONICET UNR, Argentina
‡ INRIA and LIRMM, CNRS/Université Montpellier 2, 161 Rue Ada, F-34392 Montpellier, [email protected]

inria-00617271, version 1, 26 Aug 2011
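The two central objects of the abstract — Value Iteration under the average-cost criterion and the aperiodicity transformation that lets the procedure run without first analyzing periodicity — can be sketched as follows. This is a minimal illustration, not the paper's method: the 3-state, 2-action MDP data is invented for the example, and relative value iteration is used as the standard normalized form of Value Iteration. Note that under action 0 the original chain is a deterministic 3-cycle (periodic), which is exactly the situation the transformation resolves.

```python
import numpy as np

# Illustrative 3-state, 2-action average-cost MDP (made-up data, not from the paper).
# P[a] is the transition matrix under action a; c[a] the per-state cost vector.
P = np.array([
    [[0.0, 1.0, 0.0],   # action 0: deterministic cycle 0 -> 1 -> 2 -> 0 (periodic!)
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]],
    [[0.5, 0.5, 0.0],   # action 1: an aperiodic alternative
     [0.1, 0.8, 0.1],
     [0.3, 0.3, 0.4]],
])
c = np.array([
    [2.0, 1.0, 3.0],    # costs under action 0
    [1.5, 2.5, 2.0],    # costs under action 1
])

def aperiodicity_transform(P, tau=0.5):
    """Standard aperiodicity transformation: P_tau = tau * P + (1 - tau) * I.
    Every state gains a self-loop, so each chain becomes aperiodic, while the
    stationary distribution (hence the average cost) of every stationary
    policy is unchanged."""
    n = P.shape[-1]
    return tau * P + (1.0 - tau) * np.eye(n)

def relative_value_iteration(P, c, iters=500):
    """Normalized Value Iteration for the average-cost criterion: subtracting
    the first component at each step keeps the iterates bounded, and the
    Bellman update at a fixed point yields the optimal average cost (gain)."""
    n = P.shape[-1]
    v = np.zeros(n)
    for _ in range(iters):
        Tv = (c + P @ v).min(axis=0)    # one Bellman (value-iteration) step
        v = Tv - Tv[0]                  # normalize away the additive constant
    Tv = (c + P @ v).min(axis=0)
    gain = Tv[0]                        # estimate of the optimal average cost
    policy = (c + P @ v).argmin(axis=0)
    return gain, v, policy

Pt = aperiodicity_transform(P)          # run VI on the transformed, aperiodic model
gain, v, policy = relative_value_iteration(Pt, c)
print(f"estimated optimal average cost: {gain:.4f}, greedy policy: {policy}")
```

Running the same iteration on the untransformed `P` may oscillate because of the period-3 cycle under action 0; on `Pt` the iterates converge geometrically, which is the behavior the reviewed conditions guarantee.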
Similar resources
Uniform Convergence of Value Iteration Policies for Discounted Markov Decision Processes
This paper deals with infinite horizon Markov Decision Processes (MDPs) on Borel spaces. The objective function considered, induced by a nonnegative and (possibly) unbounded cost, is the expected total discounted cost. For each of the MDPs analyzed, the existence of a unique optimal policy is assumed. Conditions that guarantee both pointwise and uniform convergence on compact sets of the minimiz...
On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of (s, S) Inventory Policies
This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic periodic-review inventory control problem with backorders, positive setup costs, and convex holding/backordering costs. The following results are established for MDPs...
متن کاملAccelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Asymptotic properties of constrained Markov Decision Processes
We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space,...
Proof of Convergence for Evolutionary Policy Iteration under a Sampling Regime
This article extends the evolutionary policy selection algorithm of Chang et al. (2005, 2007), which was designed for use in infinite horizon Markov decision processes (MDPs) with a large action space, to a discrete stochastic optimization problem, in an algorithm called Evolutionary Policy Iteration-Monte Carlo (EPI-MC). EPI-MC allows EPI to be used in a setting with a finite decision (action) ...
Journal: Annals OR
Volume: 199, Issue: -
Pages: -
Publication date: 2012